1 Maintenance

1.1 Start an R session and RStudio

You might want to continue using the project from yesterday.

1.2 Load the libraries

set.seed(1234)

if (!require("tidyverse")) install.packages("tidyverse"); library("tidyverse")
## Loading required package: tidyverse
## -- Attaching packages ----------------------------------------------------- tidyverse 1.2.1 --
## v ggplot2 3.1.0     v purrr   0.3.0
## v tibble  2.0.1     v dplyr   0.7.8
## v tidyr   0.8.2     v stringr 1.4.0
## v readr   1.3.1     v forcats 0.3.0
## -- Conflicts -------------------------------------------------------- tidyverse_conflicts() --
## x dplyr::filter() masks stats::filter()
## x dplyr::lag()    masks stats::lag()
if (!require("sf")) install.packages("sf"); library("sf")
## Loading required package: sf
## Linking to GEOS 3.6.1, GDAL 2.2.3, PROJ 4.9.3
if (!require("tmap")) install.packages("tmap"); library("tmap")
## Loading required package: tmap
if (!require("tmaptools")) install.packages("tmaptools"); library("tmaptools")
## Loading required package: tmaptools
if (!require("sjmisc")) install.packages("sjmisc"); library("sjmisc")
## Loading required package: sjmisc
## 
## Attaching package: 'sjmisc'
## The following object is masked from 'package:purrr':
## 
##     is_empty
## The following object is masked from 'package:tidyr':
## 
##     replace_na
## The following object is masked from 'package:tibble':
## 
##     add_case
if (!require("skimr")) install.packages("skimr"); library("skimr")
## Loading required package: skimr

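set.seed(1234) fixes the state of the random number generator, so steps that rely on random sampling (such as sample_n later on) give the same result on every run.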
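
A minimal base R sketch of the effect of setting the seed (the variable names are hypothetical and not part of the workshop code):

```r
# Draw five random numbers after setting the seed.
set.seed(1234)
first_draw <- sample(1:100, 5)

# Reset the seed to the same value and draw again.
set.seed(1234)
second_draw <- sample(1:100, 5)

# Both draws started from the same generator state, so they match.
identical(first_draw, second_draw)
```

Without the second set.seed() call the two draws would almost certainly differ.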

sjPlot

2 Data: SA2s and SEIFA

We will reuse data prepared yesterday.

2.1 Code

SA2_SEIFA <- readRDS("data/SA2_SEIFA.Rds")

3 Data: Airbnb in Melbourne

3.1 Background information

Inside Airbnb is a website that describes itself as:

an independent, non-commercial set of tools and data that allows you to explore how Airbnb is really being used in cities around the world. By analyzing publicly available information about a city’s Airbnb’s listings, Inside Airbnb provides filters and key metrics so you can see how Airbnb is being used to compete with the residential housing market.

The dataset is not officially approved by Airbnb (!), but it is released under an open license and, in the absence of other data, has been used by many research projects around the world.

3.2 Reading the data

3.2.1 Code

listings <- read_csv("./data/listings.csv")
## Parsed with column specification:
## cols(
##   id = col_double(),
##   name = col_character(),
##   host_id = col_double(),
##   host_name = col_character(),
##   neighbourhood = col_character(),
##   latitude = col_double(),
##   longitude = col_double(),
##   room_type = col_character(),
##   price = col_double()
## )

3.2.2 Explanation

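Note the difference between read_csv() (from readr, loaded with the tidyverse) and base R's read.csv(): read_csv() returns a tibble and reports the column types it guessed, as shown in the output above.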
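
The sketch below compares the two readers on a small made-up file. The file contents and the object names toy_readr and toy_base are assumptions for illustration only. Passing col_types states the expected types explicitly instead of relying on guessing.

```r
library(readr)

# Write a tiny, made-up CSV file to a temporary location.
tmp <- tempfile(fileext = ".csv")
writeLines(c("id,name,price", "1,Room A,60", "2,Room B,35"), tmp)

# readr: returns a tibble; the column types are declared explicitly here.
toy_readr <- read_csv(
  tmp,
  col_types = cols(
    id = col_double(),
    name = col_character(),
    price = col_double()
  )
)

# Base R: returns a plain data.frame.
toy_base <- read.csv(tmp)

class(toy_readr)
class(toy_base)
```

Declaring the types is optional, but it makes the script fail early if the file layout changes.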

3.3 Examine the data

3.3.1 Code

slice(listings, 1:5)
## # A tibble: 5 x 9
##      id name  host_id host_name neighbourhood latitude longitude room_type
##   <dbl> <chr>   <dbl> <chr>     <chr>            <dbl>     <dbl> <chr>    
## 1  9835 Beau~   33057 Manju     Manningham       -37.8      145. Private ~
## 2 10803 Room~   38901 Lindsay   Moreland         -37.8      145. Private ~
## 3 12936 St K~   50121 Frank & ~ Port Phillip     -37.9      145. Entire h~
## 4 15246 Larg~   59786 Eleni     Darebin          -37.8      145. Private ~
## 5 16760 Melb~   65090 Colin     Port Phillip     -37.9      145. Private ~
## # ... with 1 more variable: price <dbl>
skim(listings) %>% skimr::kable()

Skim summary statistics
n obs: 22895
n variables: 9

Variable type: character

variable       missing  complete  n      min  max  empty  n_unique
host_name      3        22892     22895  1    35   0      5817
name           3        22892     22895  1    188  0      22448
neighbourhood  0        22895     22895  4    17   0      30
room_type      0        22895     22895  11   15   0      3

Variable type: numeric

variable   missing  complete  n      mean     sd          p0      p25      p50      p75      p100
host_id    0        22895     22895  7.1e+07  6.5e+07     9082    1.7e+07  4.8e+07  1.1e+08  2.3e+08
id         0        22895     22895  1.9e+07  8141522.23  9835    1.3e+07  2e+07    2.5e+07  3.1e+07
latitude   0        22895     22895  -37.83   0.067       -38.22  -37.85   -37.82   -37.8    -37.48
longitude  0        22895     22895  145.01   0.13        144.48  144.96   144.98   145.01   145.84
price      0        22895     22895  148      210.88      0       71       111      165      12624

3.3.2 Explanation

3.4 Turning data spatial

3.4.1 Code

listings_sf <- st_as_sf(listings, coords = c("longitude", "latitude"), crs = 4326)

3.4.2 Explanation

Look at the difference: the longitude and latitude columns have been replaced by a single geometry column.

slice(listings_sf, 1:5)
## Simple feature collection with 5 features and 7 fields
## geometry type:  POINT
## dimension:      XY
## bbox:           xmin: 144.4843 ymin: -38.22443 xmax: 145.8391 ymax: -37.4826
## epsg (SRID):    4326
## proj4string:    +proj=longlat +datum=WGS84 +no_defs
## # A tibble: 5 x 8
##      id name  host_id host_name neighbourhood room_type price
##   <dbl> <chr>   <dbl> <chr>     <chr>         <chr>     <dbl>
## 1  9835 Beau~   33057 Manju     Manningham    Private ~    60
## 2 10803 Room~   38901 Lindsay   Moreland      Private ~    35
## 3 12936 St K~   50121 Frank & ~ Port Phillip  Entire h~   159
## 4 15246 Larg~   59786 Eleni     Darebin       Private ~    50
## 5 16760 Melb~   65090 Colin     Port Phillip  Private ~    69
## # ... with 1 more variable: geometry <POINT [°]>

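The crs argument sets the coordinate reference system. 4326 is the EPSG code for WGS 84 (longitude and latitude in degrees):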

http://spatialreference.org/ref/epsg/wgs-84/
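
A small sketch of the same conversion on made-up data (the toy data frame and its coordinates are assumptions, not taken from listings.csv):

```r
library(sf)

# Two made-up points in the Melbourne area, stored as plain columns.
toy <- data.frame(
  id = 1:2,
  longitude = c(144.96, 144.98),
  latitude = c(-37.81, -37.86)
)

# coords names the x column first (longitude), then the y column (latitude).
toy_sf <- st_as_sf(toy, coords = c("longitude", "latitude"), crs = 4326)

# The coordinate columns are gone; a geometry column has taken their place.
names(toy_sf)

# The coordinate reference system is stored with the object.
st_is_longlat(toy_sf)
st_crs(toy_sf)
```

The order in coords matters: swapping the two names would place the points at the wrong location.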

3.5 Quick map

3.5.1 Code

tmap_mode("view")
## tmap mode set to interactive viewing
listings_sf %>% 
  sample_n(1000) %>% 
  qtm()

3.5.2 Explanation

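sample_n(1000) draws a random sample of 1,000 listings, which keeps the interactive map responsive. Because the seed was set at the start of the session, the same sample is drawn each time the script is run from the top.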

4 Mapping price - density

4.1 Exploring attributes

Map a variable in your data to a visual encoding on the map.

4.1.1 Code

ggplot(listings_sf, aes(price)) + geom_histogram()
## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.

descr(listings_sf$price)
## 
## ## Basic descriptive statistics
## 
##  var    type label     n NA.prc mean     sd   se  md trimmed
##   dd numeric    dd 22895      0  148 210.88 1.39 111  119.75
##            range  skew
##  12624 (0-12624) 26.33

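Look at md (the median) and skew: the median (111) is well below the mean (148) and the skew is large (26.33), so the price distribution is strongly right-skewed.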

listings_sf %>% 
  ggplot(aes(room_type)) + geom_bar()

frq(listings_sf$room_type)
## 
## # x <character> 
## # total N=22895  valid N=22895  mean=1.39  sd=0.52
##  
##              val   frq raw.prc valid.prc cum.prc
##  Entire home/apt 14379   62.80     62.80   62.80
##     Private room  8116   35.45     35.45   98.25
##      Shared room   400    1.75      1.75  100.00
##             <NA>     0    0.00        NA      NA
listings_sf %>% 
  ggplot(aes(room_type, price)) + geom_boxplot() + scale_y_log10()
## Warning: Transformation introduced infinite values in continuous y-axis
## Warning: Removed 21 rows containing non-finite values (stat_boxplot).
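
The two warnings come from the log scale: log10(0) is -Inf, so listings with a price of 0 cannot be placed on the axis and are dropped. A base R sketch with made-up prices (the values are assumptions chosen to resemble the summary above):

```r
# Made-up prices, including one listing with a price of 0.
prices <- c(0, 35, 60, 111, 165, 12624)

# The zero becomes -Inf on the log scale.
log_prices <- log10(prices)
log_prices

# Count the values that a log-scaled plot would have to drop.
sum(!is.finite(log_prices))

# One option is to keep only positive prices before plotting.
positive_prices <- prices[prices > 0]
range(log10(positive_prices))
```

In the workshop data this would correspond to adding filter(price > 0) before the ggplot() call, if the zero-priced listings are not of interest.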

4.2 Data preparation

4.2.1 Code

Changing the coordinate system (two projected options for Australia are listed; the code below uses EPSG:3112)

GDA_1994_Geoscience_Australia_Lambert WKID: 3112 Authority: EPSG

GDA_1994_Australia_Albers WKID: 3577 Authority: EPSG

listings_sf_proj <- st_transform(listings_sf, 3112)

SA2_SEIFA_proj <- st_transform(SA2_SEIFA, 3112)
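
To see what the transformation does, the sketch below projects a single made-up point (the coordinates are an assumption) from WGS 84 to EPSG:3112. After projection the coordinates are in metres rather than degrees, which is what distance-based steps such as a bandwidth in kilometres rely on.

```r
library(sf)

# One made-up point in central Melbourne, in longitude/latitude (EPSG:4326).
pt <- st_sfc(st_point(c(144.96, -37.81)), crs = 4326)

# Project it to GDA94 / Geoscience Australia Lambert (EPSG:3112).
pt_proj <- st_transform(pt, 3112)

# Degrees before, metres after.
st_coordinates(pt)
st_coordinates(pt_proj)

# The projected geometry is no longer in longitude/latitude.
st_is_longlat(pt_proj)
```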

Creating an outline

outline <- SA2_SEIFA_proj %>% st_union()
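
A toy sketch of what st_union() does here, using two made-up unit squares in place of the SA2 polygons: the shared edge is dissolved and a single outline geometry remains.

```r
library(sf)

# Two adjacent unit squares that share the edge x = 1.
sq1 <- st_polygon(list(rbind(c(0, 0), c(1, 0), c(1, 1), c(0, 1), c(0, 0))))
sq2 <- st_polygon(list(rbind(c(1, 0), c(2, 0), c(2, 1), c(1, 1), c(1, 0))))
squares <- st_sfc(sq1, sq2)

# Union all geometries into one.
outline_toy <- st_union(squares)

length(squares)       # 2 separate polygons
length(outline_toy)   # 1 merged geometry
st_area(outline_toy)  # total area is preserved: 2
```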

Simple map

listings_sf %>% 
  sample_n(1000) %>%
  tm_shape() + 
  tm_dots(col = "price", style = "quantile", n = 5, palette = "seq")

Density calculation

room_price_density <- listings_sf_proj %>% 
  filter(room_type == "Private room") %>% 
  smooth_map(cover = outline, 
             var = price, 
             bandwidth = 0.5, 
             breaks = c(0, 1, 5, 10, 25, 50, 100, 250, 500))
names(room_price_density)
## [1] "raster"    "iso"       "polygons"  "bbox"      "nrow"      "ncol"     
## [7] "cell.area" "bandwidth"
tm_shape(room_price_density$polygons) +
  tm_fill(col = "level", palette = "YlOrRd", 
          title = "Airbnb room price density")
tm_shape(room_price_density$iso) +
  tm_iso(col = "level", palette = "YlOrRd", 
         title = "Airbnb room price density ")
hom_density <- listings_sf_proj %>% 
  filter(room_type == "Entire home/apt") %>% 
  smooth_map(cover = outline, 
             bandwidth = 0.5, 
             breaks = c(0, 1, 5, 10, 25, 50, 100, 250, 500, 1500))
tm_shape(hom_density$polygons) +
  tm_fill(col = "level", palette = "YlOrRd", 
          title = "Airbnb house density per km2")

4.2.2 Explanation
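
The density maps above depend strongly on the bandwidth argument. The sketch below illustrates the general effect of bandwidth on a kernel density surface using MASS::kde2d() as a stand-in. It is not the code that smooth_map runs, and the simulated points are an assumption, not the Airbnb data.

```r
library(MASS)  # ships with R; kde2d() is used here only as a stand-in

set.seed(1234)

# Made-up point pattern: 200 points clustered around the origin.
x <- rnorm(200, mean = 0, sd = 1)
y <- rnorm(200, mean = 0, sd = 1)

# Same points and same grid, two different bandwidths.
limits <- c(-4, 4, -4, 4)
dens_narrow <- kde2d(x, y, h = 0.5, n = 50, lims = limits)
dens_wide   <- kde2d(x, y, h = 3,   n = 50, lims = limits)

# A larger bandwidth spreads each point over a wider area,
# so the surface is smoother and its peak is lower.
max(dens_narrow$z)
max(dens_wide$z)
```

A small bandwidth shows local hot spots but is noisy, while a large one shows only the broad pattern. The same trade-off applies when choosing bandwidth in smooth_map.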

5 Linking prices and deprivation

5.1 Spatial overlay

5.1.1 Code

Link points to polygons (a spatial join)

listings_sf_SEIFA <- st_join(listings_sf_proj, SA2_SEIFA_proj, join = st_intersects)
slice(listings_sf_SEIFA, 1:5)
## Simple feature collection with 5 features and 20 fields
## geometry type:  POINT
## dimension:      XY
## bbox:           xmin: 926788.1 ymin: -4342990 xmax: 1048013 ymax: -4258184
## epsg (SRID):    3112
## proj4string:    +proj=lcc +lat_1=-18 +lat_2=-36 +lat_0=0 +lon_0=134 +x_0=0 +y_0=0 +ellps=GRS80 +towgs84=0,0,0,0,0,0,0 +units=m +no_defs
##      id                                             name host_id
## 1  9835                           Beautiful Room & House   33057
## 2 10803         Room in Cool Deco Apartment in Brunswick   38901
## 3 12936 St Kilda 1BR APT+BEACHSIDE+VIEWS+PARKING+WIFI+AC   50121
## 4 15246                 Large private room-close to city   59786
## 5 16760                 Melbourne BnB near City & Sports   65090
##       host_name neighbourhood       room_type price SA2_MAIN16
## 1         Manju    Manningham    Private room    60  207021156
## 2       Lindsay      Moreland    Private room    35  206011106
## 3 Frank & Vince  Port Phillip Entire home/apt   159  206051133
## 4         Eleni       Darebin    Private room    50  206021112
## 5         Colin  Port Phillip    Private room    69  206051134
##       SA2_NAME16 SA4_CODE16             SA4_NAME16 IRSD_s IRSD_d IRSAD_s
## 1        Bulleen        207 Melbourne - Inner East   1047      7    1053
## 2 Brunswick East        206      Melbourne - Inner   1066      9    1091
## 3       St Kilda        206      Melbourne - Inner   1058      8    1083
## 4      Thornbury        206      Melbourne - Inner   1038      7    1057
## 5  St Kilda East        206      Melbourne - Inner   1066      8    1086
##   IRSAD_d IER_s IER_d IEO_s IEO_d   URP                  geometry
## 1       8  1041     8  1054     8 11045 POINT (981421.9 -4291024)
## 2       9   946     2  1141    10 10962   POINT (971651 -4289471)
## 3       9   919     2  1137    10 26124 POINT (970445.7 -4299815)
## 4       8   965     3  1099     9 18466 POINT (972473.9 -4288699)
## 5       9   939     2  1141    10 16417 POINT (971710.3 -4300461)

5.1.2 Explanation
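
A toy version of the spatial join, with made-up zones and points standing in for the SA2 polygons and the listings. Each point receives the attributes of the polygon it falls in; a point outside every polygon gets NA, which is why the later code filters on !is.na(IRSAD_d).

```r
library(sf)

# Two made-up square zones, A and B.
zone_a <- st_polygon(list(rbind(c(0, 0), c(1, 0), c(1, 1), c(0, 1), c(0, 0))))
zone_b <- st_polygon(list(rbind(c(1, 0), c(2, 0), c(2, 1), c(1, 1), c(1, 0))))
zones <- st_sf(zone = c("A", "B"), geometry = st_sfc(zone_a, zone_b))

# Three made-up points: one in A, one in B, one outside both zones.
pts <- st_sf(
  id = 1:3,
  geometry = st_sfc(
    st_point(c(0.5, 0.5)),
    st_point(c(1.5, 0.5)),
    st_point(c(5, 5))
  )
)

# Left join by default: every point is kept, unmatched points get NA.
joined <- st_join(pts, zones, join = st_intersects)
joined$zone
```

Both layers must use the same coordinate reference system, which is why the workshop code joins the two projected objects.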

5.2 Relationship

listings_sf_SEIFA %>% 
  st_drop_geometry() %>% 
  filter(room_type == "Private room") %>% 
  filter(!is.na(IRSAD_d)) %>% 
  arrange(IRSAD_d) %>% 
  group_by(IRSAD_d) %>% 
  descr(price)
## 
## ## Basic descriptive statistics 
## 
## Grouped by:
## IRSAD_d: 1 
##    var    type label   n NA.prc  mean    sd  se md trimmed        range
##  price numeric price 108      0 63.75 57.17 5.5 49   53.59 478 (22-500)
##  skew
##  4.92
## 
## Grouped by:
## IRSAD_d: 2 
##    var    type label  n NA.prc mean    sd   se md trimmed        range
##  price numeric price 86      0 58.8 41.95 4.52 46   50.67 258 (12-270)
##  skew
##  2.64
## 
## Grouped by:
## IRSAD_d: 3 
##    var    type label   n NA.prc  mean    sd   se md trimmed        range
##  price numeric price 200      0 63.02 49.36 3.49 49   53.31 366 (19-385)
##  skew
##  3.76
## 
## Grouped by:
## IRSAD_d: 4 
##    var    type label   n NA.prc  mean    sd   se md trimmed        range
##  price numeric price 188      0 60.22 46.86 3.42 53   54.85 596 (19-615)
##  skew
##  9.07
## 
## Grouped by:
## IRSAD_d: 5 
##    var    type label   n NA.prc mean    sd   se md trimmed        range
##  price numeric price 343      0 69.8 50.19 2.71 54   59.02 297 (19-316)
##  skew
##  2.46
## 
## Grouped by:
## IRSAD_d: 6 
##    var    type label   n NA.prc  mean     sd    se md trimmed
##  price numeric price 677      0 91.28 358.38 13.77 60   61.81
##          range skew
##  9000 (0-9000)   23
## 
## Grouped by:
## IRSAD_d: 7 
##    var    type label    n NA.prc  mean     sd   se md trimmed
##  price numeric price 1450      0 81.66 112.99 2.97 64   65.69
##           range  skew
##  1985 (15-2000) 11.18
## 
## Grouped by:
## IRSAD_d: 8 
##    var    type label   n NA.prc  mean     sd  se md trimmed         range
##  price numeric price 949      0 75.48 104.66 3.4 60   62.31 2500 (0-2500)
##   skew
##  15.92
## 
## Grouped by:
## IRSAD_d: 9 
##    var    type label    n NA.prc  mean    sd   se md trimmed
##  price numeric price 2377      0 77.47 65.98 1.35 60   65.87
##           range skew
##  1285 (19-1304) 7.01
## 
## Grouped by:
## IRSAD_d: 10 
##    var    type label    n NA.prc  mean  sd   se md trimmed         range
##  price numeric price 1722      0 88.25 205 4.94 71   71.75 8000 (0-8000)
##   skew
##  33.81
listings_sf_SEIFA %>% 
  st_drop_geometry() %>% 
  filter(room_type == "Private room") %>% 
  filter(!is.na(IRSAD_d)) %>% 
  ggplot(aes(as.factor(IRSAD_d), price)) + 
  geom_boxplot() + scale_y_log10()
## Warning: Transformation introduced infinite values in continuous y-axis
## Warning: Removed 5 rows containing non-finite values (stat_boxplot).
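
For a more compact comparison than the full descr() output, the same grouping can be summarised with dplyr. The sketch below uses a made-up table (the column names decile and price and all values are assumptions). It shows why the median is the safer summary when a group contains an extreme price.

```r
library(dplyr)

# Made-up prices in two deciles; decile 2 contains one extreme value.
toy <- data.frame(
  decile = c(1, 1, 2, 2, 2),
  price = c(40, 60, 50, 70, 900)
)

# One row per decile with the count, median and mean price.
by_decile <- toy %>%
  group_by(decile) %>%
  summarise(
    n = n(),
    median_price = median(price),
    mean_price = mean(price)
  )

by_decile
```

In decile 2 the single value of 900 pulls the mean up to 340 while the median stays at 70, the same pattern seen in the IRSAD deciles with very large maximum prices.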

6 Saving your data

saveRDS(listings_sf, "data/listings_sf.Rds")
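
A short base R sketch of the save and reload cycle on a made-up data frame, written to a temporary file so that nothing in data/ is overwritten:

```r
# Made-up object to save.
toy <- data.frame(id = 1:3, price = c(60, 35, 159))

# Save to a temporary .Rds file and read it back.
path <- tempfile(fileext = ".Rds")
saveRDS(toy, path)
restored <- readRDS(path)

# The restored object is the same as the original.
identical(toy, restored)
```

Note that the line above saves listings_sf, the unprojected points without the SEIFA columns. If the joined table is needed later, listings_sf_SEIFA would have to be saved in the same way.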

7 Further topics

  1. Try changing st_make_grid(n = 50, square = FALSE) to st_make_grid(n = 50, square = TRUE). What results do you get? Does the pattern change a lot? What are the advantages or disadvantages of these two representations?

  2. Read the help file for smooth_map. Try increasing the bandwidth parameter.

  3. If you analysed private rooms, repeat the analysis for entire homes, or vice versa.
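
As a starting point for the first topic, the sketch below builds a square and a hexagonal grid over a made-up 10 x 10 box and counts random points per cell. The box, the points and n = 5 are assumptions chosen to keep the example small; the exercise itself uses n = 50 on the workshop data.

```r
library(sf)

set.seed(1234)

# Made-up study area: a 10 x 10 box.
box <- st_sfc(st_polygon(list(rbind(
  c(0, 0), c(10, 0), c(10, 10), c(0, 10), c(0, 0)
))))

# 100 made-up points inside the box.
pts <- st_as_sf(
  data.frame(x = runif(100, 0, 10), y = runif(100, 0, 10)),
  coords = c("x", "y")
)

# Square cells versus hexagonal cells over the same area.
grid_sq  <- st_make_grid(box, n = 5, square = TRUE)
grid_hex <- st_make_grid(box, n = 5, square = FALSE)

# Number of points falling in each cell.
counts_sq  <- lengths(st_intersects(grid_sq, pts))
counts_hex <- lengths(st_intersects(grid_hex, pts))

length(grid_sq)
length(grid_hex)
summary(counts_sq)
summary(counts_hex)
```

Plotting both grids with the counts as fill colour is one way to compare how each cell shape represents the same point pattern.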

8 Resources